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The Stochastic Series Expansion method (SSE) is a Quantum Monte Carlo (QMC) technique 
working directly in the imaginary time continuum and thus avoiding "Trotter discretization" errors. 
Using a non-local "operator-loop update" it allows treating large quantum mechanical systems of 
many thousand sites. In this paper we first give a comprehensive review on SSE and present 
benchmark calculations of SSE's scaling behavior with system size and inverse temperature, and 
compare it to the loop algorithm, whose scaling is known to be one of the best of all QMC methods. 
Finally we introduce a new and efflcient algorithm to measure Green's functions and thus dynamical 
properties within SSE. 
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I. THE SSE TECHNIQUE 

Since their first formulation in the early eig htieJi 
Quantum Monte Carlo (QMC) methods have become 
one of the most powerful numerical simulation techniques 
and tools in many-body physics. The first QMC algo- 
rithms were based on a discretization in imaginary time 
("Trotter decomposition") and used purely local update 
steps to sample the system's statistically relevant states. 
These methods require a delicate extrapolation to zero 
discretization in order to reduce systematic errors. Fur- 
thermore, the purely local updates often proove incapable 
to traverse the accessible states in an efficient way: au- 
tocorrelation times grow rapidly with increasing system 
size. 

A more recent dass of QMC algorithms, the so-called 
"loop algorithms"lj 113, use non-local cluster or loop up- 
date schemes, thus reducing autocorrelation times by sev- 
eral orders of magnitude in some cases. Unfortunately, it 
is often highly non-trivial to construct a loop algorithm 
for a new Hamiltonian, and some important interactions 
cannot be incorporated into the loop scheme. These in- 
teractions have to be added as a posteriori acceptance 
probabilities after the construction of the loop, which 
can seriously decrease overall efficiency of the siatulation. 
Some loop algorithms also suffer from "freezing" B'E£l when 
the probability is high that a certain type of cluster oc- 
cupies almost the whole system. 

These insufficiencies can be overcome using the "sto- 
chastic series expansion" (SSE) approach together with a 
loop-type updating scheme (see Ref. ^ and earlier works 
referenced therein). 

• SSE is (almost) as efficient as loop algorithms on 
large systems. 

• It is a numerically exact method without any dis- 
cretization error. 



• It is as easy to construct and general in applicability 
as world- line methods. 

Following SandvikB^^I we briefly outline the basic 
ideas of SSE now. The central quantity to be sampled in 
a QMC simulation is the partition function 



Z = Tr(e-'5^), 



(1) 



where H is the system's Hamiltonian and (3 — ^UT the 
inverse temperature. Standard QMC techniqucsO split 
up the exponential into a product of many "imaginary 
time slices" e~^'^^ and truncate the Taylor expansion 
of this expression after a certain order in At, thereby 
introducing a discretization error of order Ar". In SSE, 
however, one chooses a convenient Hilbert base { | a ) } (for 



example the eigenbase { | a ) } = 
and expands Z into the power series 



Sf, S2, Sf^ )}) 



(2) 



The statistically relevant exponents of this power series 
are centered around 



(n) oc iV,/3, 



(3) 



where Ng is the number of sites (or orbitals) in the sys- 
tem. (This follows from Eq. ([ll| ) and from (E) cx Ng.) 
We can thus truncate the infinite sum over n at a finite 
cut-off length L oc Ns(3 without introducing any system- 
atic error for practical computations. The best value for 
L can be determined and adjusted during an initial ther- 
malization phase of the QMC simulation: beginning with 
a relatively small value of L one can start the QMC up- 
date process, stop it whenever the cut-off L is exceeded 
and continue with L increased by 10. ..20%. 

Now let H be composed of a certain number of el- 
ementary interactions involving one or two sites (such 
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as on-site potentials, nearest neighbor hopping etc.). In 
order to obtain a uniform notation we combine those in- 
teractions affecting only one site to new "bond" interac- 
tions. (One can, for example, take two chemical potential 
terms /i • n(sitel) and /i • n(site2) and form the bond term 
^/i(ri(sitel) -|-n(site2)) with the constant C assuring that 
the sum over all new bond terms equals the sum over all 
initial on-site terms.) We can thus assume in the follow- 
ing that iJ is a finite sum of "bond" terms Hb and that 
the operator strings i?" in (^) can be split into terms of 
the form 



(a. 



(4) 



where bi labels the bond on which the elementary interac- 
tion term operates and Ui the operator type (e.g. density- 
density interaction or hopping). By introducing "empty" 
unit operators 7^*^°) = id one can artificially grow all op- 
erator strings to length L and obtainll3 



" {Sl} 



(5) 



Here {Sl} denotes the set of all concatenations of L bond 
operators number of non-unit operators 

in^L. 

If we want to sample the (a, S'l) according to their 
relative weights with a Monte Carlo procedure we have 
to make sure that the matrix element of each bond op- 
erator is zero or negative since in order to fulfill detailed 
balance we choose the acceptance probability p of a bond 
interaction to be proportional to its negative matrix el- 
ement. This requires however that all matrix elements 
be non-positive. Does a simple redefinition of the zero 
of energy help? For the diagonal operators we can in- 
deed add the same negative constant C to each of them 
without changing the system's properties, and thus make 
all matrix elements negative or zero. Unfortunately, for 
the non-diagonal terms an equally simple remedy does 
not exist. If one can show, however, that such a non- 
diagonal operator must appear pairwise for the matrix 
element to be non-zero, its matrix element can be multi- 
plied by —1 without changing the physics of the system. 
(This corresponds to a gauge transformation on all lattice 
sites with odd parity.) On non- frustrated lattices this 
trick is widely applicable, which considerably increases 
the set of Hamiltonians suitable for SSE. If there are valid 
world-line configurations carrying an odd number of non- 
diagonal vertices with positive matrix elemet - which is 
typical for Hamiltonians and lattices with frustrations - 
only the conveptional approach of dealing with the sign 
problem helpaa one simulates a new system with the 
acceptance probabilty p' — \p\ and obtains the estimate 
of a physical quantity Q in the form 



(Q> = 



(Qsignp) 
(signp) 



Unfortunately, (signp) tends to zero exponentially with 
increasing system size Ng and inverse temperature (3, so 
that the computation time needed to achieve a certain 
accuracy exponentially increases with Ns(3 and the prac- 
tically accessible range of system sizes and temperatures 
is rather limited. 



II. LOOP UPDATES 

Having outlined the basic idea of SSE we review 
the non-iacal updating updating scheme proposed by 
Sandvik.EJ In the following figures we illustrate the pro- 
ceeding by means of a simple physical model: a system 
of two types of hard-core bosons on a 6-site chain with 
periodic boundary conditions and Hamiltonian 

H = -tY, Y,v[ai^,a^^,+^+H.c. 

Q=l,2 j: 



Q=l,2 i 



(6) 



with 



(7) 



The creation operator ^ creates a hardcore boson of 
type a = 1 or 2 on site i. The first term [t) is a nearest 
neighbor hopping term, the second term (/ia) the chem- 
ical potential and the third term (t^q,) pair creation and 
annihilation. The projection operator V implements the 
hard core constraints between the two types of bosons. 
In the world-line representation - in which the horizon- 
tal axis represents the spatial dimension and the vertical 
axis the propagation level Z = 1...L - we symbolize type-1 
bosons by single solid lines, type-2 bosons by double lines 
and empty sites by dotted fines (see Fig. p. 

Sandvik separates the set of all bond operators into 
three classes: empty operators H''^\ diagonal operators 
iJ^''^ and non-diagonal operators H'^"-'^\ The QMC pro- 
cess starts with an arbitrarily chosen initial state | a ) and 
an empty operator string: in Fig. |l|, for example, three 
sites are occupied with type-1 bosons, two sites are empty 
and on site 2 is occupied by a type-2 particle. Now two 
different update steps are performed in alternating order: 
a diagonal update exchanging empty and diagonal bond 
operators and an operator loop update transforming and 
exchanging diagonal and non-diagonal operators. 

In the diagonal update step the operator string posi- 
tions I = 1...L are traversed in ascending order. If the 
current bond operator is a non-diagonal one it is left 
unchanged; if it is an empty or diagonal operator it is 
replaced by a diagonal or empty one with a certain prob- 
ability satisfying detailed balance (i.e. an operator with 
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final state 



(-) (1) (2) (-) (1) (1)> 
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Start State | (-) (1) (2) (-) (1) (1)) 

FIG. 1. world-line representation of an arbitrarily chosen 
start state for a physical system with three allowed occupa- 
tions per site: empty (dashed line), particle 1 (solid line) or 
particle 2 (double line). The initial cut-off length L has been 
set to L = 9, and the initial bond operator string consists 
only of "empty" operators. 



propagation step 

9-' 



FIG. 2. In the diagonal update step a certain number 
of empty bond operators is replaced by diagonal ones (and 
vice versa). In this example seven of the initial nine identity 
operators have been replaced. 



lower energy is more likely to be maintained or inserted 
than an operator "witi, higher energy) (Fig. |^). 
FoUowing Sandviktil we use the notation 



(a.) I 



(8) 



for the state obtained by acting on | a ) with the 
first I bond operators and | ab{l) ) for the restriction 



of I a{l) ) to the bond h. Let M be the total num- 
ber of interacting bonds on the lattice. Then the de- 
tailed balance conditions for the diagonal update read 



P(H(o)(/)->i7('^'(0) 



^ M(3{ab{l)\Hf^\ab{l)) 
L ~ n 



min ( 1 , 



(9) 



L~n+1 



MP{ab{l)\H^^^\ab{l)] 



Non-diagonal bond operators cannot simply be in- 
serted into the world line configuration as diagonal oper- 
ators can: their insertion and modification requires local 
changes of the world-line occupations. We discussed ear- 
lier in this paper that concatenated local changes along a 
closed path (or loop) through the network of world-lines 
and interaction vertices are much more efficient than in- 
dependent purely local changes. Sandvik proposed the 
following method to construct such a loop: a certain 
world-line and a propagation level / on it is chosen ar- 
bitrarily; at the chosen point one disturbs the world-line 
by a local change - for example the creation or annihi- 
lation of a particle. Then one chooses a direction (up 
or down in propagation direction) and starts moving the 
disturbation in this direction (Fig. ^). The aim is to 
move this disturbation (we will call it "loop head" in the 
following) through the network of world-lines and inter- 
action vertices until the initial discontinuity is reached 
again and healed up. 



propagation step 
9 



bounce 




FIG. 3. In the operator loop update step a local change 
is inserted on a world-line and then moved through the 
world-lines and vertices. At each vertex a new direction is 
chosen such that the probability of a path is proportional to 
the negative energy of the resulting interaction vertex (de- 
tailed balance). 

Whenever the loop head reaches an interaction vertex 
we must decide how to go on; in the situation shown in 
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propagation step 

9-' 



FIG. 4. The loop update closes if the initial insertion point 
is reached again and the inserted world-line discontinuity thus 
removed. 



Fig. 1^ the path "bounce" is always possible since it results 
in an unchanged vertex. The path "straight" results in 
a diagonal vertex, and the path is possible if the matrix 
element of that vertex is nonzero. The path "turn" is 
only allowed if the Hamiltonian contains nearest neighbor 
hopping terms for particle type 2, while path "jump" 
is forbidden unless the Hamiltonian also allows for pair 
creation of particle type 2. The choice among the allowed 
paths must again satisfy detailed balance. 

In our model - in which both pair creation and hop- 
ping are allowed - we might end up with the series "turn" , 
"jump" , "turn" , "turn" of path choices, after which the 
starting point is regained and the world-line discontinu- 
ity healed up (Fig. ^). The overall result of this loop is 
that we have replaced 4 diagonal interactions by 4 non- 
diagonal interactions (marked "n.d." ) in Fig. ^. 

Sandvik's method implicitly assumes that running 
with a world-line change into an interaction vertex al- 
ways requires choosing an outgoing leg and a change on 
it and continuing the loop. But what if the encircled ver- 
tex in Fig. m with three empty legs and one leg occupied 
by particle 2 is also a valid vertex? Then we have to add 
a fifth possibility to the list of allowed path choices: the 
"stop here" . If this last alternative is chosen the loop has 
reached a dead end. In this case our SSE code terminates 
the loop here, goes back to the starting point and moves 
in the opposite direction until either another dead end is 
encounterd or the starting point is reached again and the 
initial discontinuity is healed up. 

From Fig. || one can see that when choosing a path 
through the current vertex there is always the possibility 
to undo the current change on the incoming leg of the 
vertex and to "bounce" backwards on the same leg. This 
path choice is normally not very helpful since it means 
one step backwards in the construction of the current 
loop. Fortunately, all "bounce" paths can be suppressed 



without violating detailed balance if on each bond all 
nonzero matrix elements are equal, or can be made equal 
after a suitable energy shift of the diagonal vertices. As 
an additional benefit, without the "bounce" path the al- 
gorithm becomes equivalent to the loop algorithms. For 
each vertex a path can be chosen according to detailed 
balance, after which the loop construction becomes de- 
terministic. All the Heisenberg models studied in section 
[V are examples for this class of "optimizable" physical 



systems. 

A further improvement of the update scheme is possi- 
ble in the limit of high temperatures, i.e. /3 — > 0. Equa- 
tion (^) tells us that the average number of (non-empty) 
vertices is rather small in this situation, and a large part 
of all world-lines is not connected to any vertex at all. 
The loop update will not be very efficient here since it 
essentially needs a sufficient number of vertices intercon- 
necting the world-lines. For this reason our SSE code ad- 
ditionally performs a so-called "free world-line update" 
on each world line carrying no vertex at all. In this up- 
date the occupation of the entire world-line is changed to 
a randomly selected new occupation. 

We have stressed several times that all the local path 
choices satisfy detailed balance. What remains to be 
shown is that the updating mechanism is ergodic in the 
grand canonical ensemble, i.e. that all bond operator 
strings Sl and all states | a ) can be reached. In or- 
der to demonstrate that we remind the reader that loops 
crossing the boundary between first and last propagation 
level I modify the initial state | a ) for the next update 
cycle. Therefore, the loops sample not only Sl but also 
I a ) , and starting from a completely empty system any al- 
lowed configuration can be generated by a series of loops 
traversing one entire world-line each. 

Numerical tests of the loop-update mechanism de- 
scribed above show that for large system sizes and if there 
are elementary interactions with very different energy 
scales, the loop construction sometimes gets stuck and 
the loop head does not find its way back to the starting 
point even after millions of steps. In order to avoid this 
trapping loops that exceed a critical length are aborted 
and the original state of the vertices is restored. This 
causes no systematic errors for measurements done be- 
tween loop updates as detailed balance is not violated. 
The measurements of Green's functions G(r), however, 
which are performed "on the run" during loop construc- 
tion (see Sec. are biased if large loops are thrown 
away. Since the large loops are more likely to reach re- 
gions of the systems far away from the starting point than 
short loops, the values of G{r) for large distances r are 
systematically under-estimated if a considerable amount 
of large loops is aborted. Hence the total number of 
aborted loops has to be checked before one can trust in 
the recorded Green's functions. 
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III. MEASUREMENTS 

Efficient estimators for many static observables witliin 
the SSE mechanism have been derived by SandvikHj 

• AU observables iJ^"^ appearing as elementary in- 
teractions in the system's Hamiltonian can be mea- 
sured very easily by counting the corresponding in- 
teraction vertices in the bond operator string Sl'- 
if Sl contains on average {N{a)) such vertices one 
obtains 



{Hi'^))=-l{N{a)). 



(10) 



Summing over all elementary terms iJ^"^ gives an 
estimator for the internal energy E: 



E 



(11) 



Instead of working in a representation with varying n a 
fixed string size L can be chosen, as the identity ver- 
tices are uniformly distributed and do not influence the 
mapping from index to imaginary time. 

The corresponding generalized susceptibilities can be 
calculated straight forward by integrating {D2{t)Di{0)) 
over T, 



Xl2 



{D2{t)Di{Q)) dr. 



which gives I 



Xl2 








(n+l) 



(16) 



(17) 



where n is the number of non-empty interaction 
vertices in Sl- (This equation can be derived very 
easily from (E) — -^InZ.) 

• For the heat capacity Cv we additionally have to 
measure the fluctuations of n: 

Cv = {n^)^{nf-{n). (12) 

• Equal time correlations of two diagonal operators 
Di and D2 can be measured via 

= (^-^f^d^ind^iny (13) 

where d^[l] = {a{l)\D^\a{l)) . 

Are there equally efficient estimators for time-depen- 
dent observables? In SSE the propagation index I de- 
scribes the evolution of an initial state when a series of 
elementary terms of the Hamiltonian is acting on it; thus 
I plays a role analogous to imaginary time in .-a. stan- 
dard path integral. More detailed calculationsEj show 
that an imaginary time separation r corresponds to a 
binomial distribution of propagation distances Al; the 
time-dependent correlation (I?2(t)£'i(0)), for example, 
is related to the correlator 

Ci2(A0 = ^Vd2[^ + AZ]di[;] (14) 

via 

(15) 



IV. SCALING BEHAVIOR 

One decisive criterion for the performance of a QMC 
simulation technique is the behavior of computation time 
C as a function of system size Ng or inverse temperature 
f3] To facilitate a hardware-independent measurement of 
C and a comparison to other QMC techniques we de- 
fine C as the number of elementary update operations 
needed to transform a given state |q!^"^) into an new state 
|q,("+i)^ in such a way that the mean autocorrelation time 
T is equal to 1. In SSE the number of elementary update 
operations is the number of diagonal vertices tested for 
replacement plus the number of vertices traversed during 
the loop update. In the following we compare SSE to 
the loop-algorithm, which is known to show an excellent 
scaling behavior for many benchmark problems. As test 
models we choose isotropic antiferromagnetic Heisenberg 
models in one, two and three dimensions with up to 4096 
sites and (3 up to 64 in a vanishing or finite external mag- 
netic field. 

Following Ref. ^ we describe the scaling behavior of 
the two algorithms by means of the dynamical exponent 
z defined from 

T-C(x(3-l^F. (18) 

Here, r • C is the computational effort (i.e. the number 
of elementary update steps) needed to achieve a mean 
autocorrelation time of t = 1 for the measurements of 
the studied quantity; D is the spatial dimension of the 
simulated system and I = \/]V7 its length in each di- 
mension. From Table | we see that both simulation tech- 
niques show an approximately equal performance and an 
very good scaling behavior: since the ratio Ct/{I3Ns) is 
approximately constant we obtain z « in both cases. 

Next we enlarge the square lattice into the third spatial 
dimension and examine a bilayer quantum Heisenberg 
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TABLE I. 2D antiferromagnetic Heisenberg model at van- 
ishing magnetic field h = 0: calculation of the uniform 
magnetic susceptibility (x) = ^^f^ | from QMC simula- 
tions with 1000000 (120000 in the case /3 = L = 64) up- 
date-measurement cycles (left: SSE, right: loop-algorithm). 
C • r is the number of elementary update operations per cycle 
needed to achieve a mean autocorrelation time t = 1 for the 
measurements of x- 



TABLE III. ID chain antiferromagnetic Heisenberg model 
in a magnetic field h: calculation of magnetization M 
from QMC simulations with 1000000 update-measurement 
cycles (left: SSE, right: loop-algorithm) for a system with 
P — Ns = 16. C ■ T is the number of elementary update op- 
erations per cycle needed to achieve a mean autocorrelation 
time T = 1 for the measurements of M. 



SSE 







C-T 


SSE 


C-r 


Loop 


h/J 


l3-h 


C-T 


M 


C-T 




■Ns 






Co-To 




Co-To 




X 




X 


0:02 


0.32 


1.00 


0.008(2 ±5) 


1.00 


4 




1.00 


0.040(46 ± 16) 


1.00 


0.040(20 ± 10) 


0.04 


0.64 


1.02 


0.016(4 ±6) 


0.97 


8 


8^ 


0.61 


0.044(83 ± 15) 


0.90 


0.044(92 ± 8) 


0.1 


1.6 


1.22 


0.057(8 ± 8) 


1.84 


16 


16^ 


0.40 


0.044(72 ± 12) 


0.56 


0.044(68 ± 6) 


0.2 


3.2 


1.98 


0.24(4 ±2) 


7.55 


32 


32^ 


0.40 


0.044(19 ± 11) 


0.53 


0.044(24 ± 6) 


0.4 


6.4 


1.37 


0.893(8 ± 9) 


167.40 


64 


64^ 


0.36 


0.044(01 ± 23) 


0.42 


0.044(07 ± 14) 


1.0 


16.0 


0.86 


2.069(0 ± 8) 


2338.74 



Loop 



M 



0.0083(7 ± 7) 
0.017(6 ±2) 
0.057(7 ± 5) 
0.24(3 ± 2) 
0.89(4 ±3) 
2.(24 ± 12) 



TABLE IL Square bilayer antiferromagnetic Heisenberg 
model at vanishing magnetic field h = and at the quantum 
critical point {J±/ J = 2.524): calculation of the uniform mag- 
netic susceptibility (x) from QMC simulations with 1000000 
(390000 in the case (3 = L = 32) update-measurement cycles. 











SSE 




Loop 




P 




C-T 




C-T 






P-Ns 


X 




X 


4 


2 


-e 


1.00 


0.0115(6 ±7) 


1.00 


0.0114(6 ±5) 


8 


2 


■8^ 


0.96 


0.0068(2 ±6) 


1.03 


0.0069(2 ± 2) 


16 


2 ■ 


16^ 


0.68 


0.0036(8 ±4) 


1.20 


0.0036(6 ± 2) 


32 


2 ■ 


32^ 


0.56 


0.0018(5 ±3) 


1.20 


0.0018(3 ± 1) 



antiferromagnet at the quantum critical point separating 
the spin gap phase from the magnetucally ordered one.t£l 
Our aim is to measure scaling behavior and dynamical 
exponents exactly at this quantum critical point. This 
point is of particular interest since the immediate neigh- 
borhood of a phase transition often leads to the so-called 
"critical slowing down" of QMC simulations, i.e. explod- 
ing autocorrelation times and thus a dramatic decrease 
of efficiency of the QMC update process. 

The results in Table || show that the scaling behavior 
for both algorithms is still almost linear in (3Ns. The 
scaling for SSE looks slightly superior to the loop algo- 
rithm. This difference can most probably be attributed 
to the fact that improved estimators were used in the 
loop algorithm simulation, leading to slightly smaller er- 
rors but larger autocorrelation times. There is no sign of 
critical slowing down in either algorithm. 

As we have mentioned in the introduction one of the 
major advantages of SSE is that external potentials (and 
magnetic fields in spin models) can be included without 
a loss of performance. To verify this assertion we now ex- 



amine the antiferromagnetic Heisenberg model on a chain 
and a square lattice in a finite magnetic field h ^ 0. For 
the loop-algorithm we expect to find a rapidly increas- 
ing autocorrelation time and decreasing performance if 
the product of magnetic field h and inverse temperature 
is much larger than 1. This is due to the fact that the 
external field is incorporated into the loop-algorithm via 
a-posteriori acceptance probabilities for each constructed 
loop, for ph ^ 1 these probabilities are still large, 
whereas at /3/i « 1 they begin to decrease considerably. 

Indeed, the numerical results in Table III demonstrate 
that at (3h « 10 the loop algorithm cannot be used any 
more because the autocorrelation times get too long. For 
SSE, on the contrary, we do not expect any negative ef- 
fect by introducing a magnetic field whose strength is of 
the order of the other elementary interactions, h ~ J, 
since no a-posteriori acceptance decision is neccessary. 
We rather presume that performance is slightly worse 
for h/J ^ 1 because there are elementary interaction 
vertices with very different energy scales. Both predic- 
tions are verified by the data in Table III. For weak 
fields ft. <C J it might be preferable to construct loops in 
zero field, and to introduce the field via an a-posteriori 
Metropolis decision on whether to accept loops which 
change the magnetization, as it is done in the loop algo- 
rithm. 

For sake of completeness we also show the correspond- 
ing data for the 2-dimensional Heisenberg model in Table 
IV. The results demonstrate that the different behavior 



of SSE and loop-algorithm described in the one dimen- 
sional case is even more severe in two dimensions: the 
loop- algorithm loses its efficiency already at (3hK, 1.5. 

In some cases other performance measurements are 
more interesting. One could ask how the computation 
time till a certain accuracy in a certain measured vari- 
able is reached scales with j3 and Ns- This is studied in 
Fig. ||. For the square lattice Heisenberg antiferromagnet 
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TABLE IV. Square lattice antiferromagnetic Heisenberg 
model in a magnetic field h: calculation of magnetization 
M from QMC simulations with 1000000 update-measurement 
cycles for a system with inverse temperature (iJ = 16 and 
Ns — 16^ lattice sites. C ■ r is the number of elementary 
update operations per cycle needed to achieve a mean auto- 
correlation time T = 1 for the measurements of M. 



SSE 



Loop 



h/J 


P-h 


C-T 

Co-ro 


M 


C-T 
Co-To 


M 


0:02 


0.32 


1.00 


0.22(4 ±8) 


1.00 


0.231(3 ±3) 


0.04 


0.64 


0.83 


0.47(9 ± 8) 


1.38 


0.477(7 ± 7) 


0.1 


1.6 


0.94 


1.41(0 ± 9) 


5.61 


1.42(2 ± 2) 


0.2 


3.2 


0.44 


3.47(0 ± 7) 


34.25 


3.48(6 ± 7) 


0.4 


6.4 


0.18 


7.73(7 ±4) 


1691.66 


7.7(8 ± 7) 


1.0 


16.0 


0.12 


22.10(6 ±4) 







we trace the computation time to reach an accuracy of 
4 digits in energy as a function of (3 (Fig. || top) and Ns 
(Fig. H bottom). The exponents k(/3) in C oc Z?"'*^^ and 
K(Afs) in C oc p'^'^^^'^ derived from Fig. ^ are 

K(p) = 0.34 ±0.05, 
K(N,) = 0.48 ±0.05. 

Both quantities are smaller than 1, and Eq. (|8|) would 
return a negative dynamical exponents z. This is due 
to self-averaging: in a large system local fluctuations of 
a physical observable around its mean value on different 
subregions of the lattice can compensate and average out 
each other, thereby lowering the observable's measured 
variance. The computational effort needed to get ther- 
modynamical averages to a certain relative error scales 
sublinearly with system size and inverse temperature, so 
that systems of several thousand sites or at temperatures 
of not more than O.OOIJ can be simulated within minutes 
or a few hours on a standard PC or workstation. 



V. GREEN'S FUNCTIONS 







■ 






The observables listed listed in section [II serve to ac- 
cess important static thermodynamic properties of the 
studied system. However, properties such as photo emis- 
sion (at(k,tj) a(o,o)) or spin flip (S'~(k,Lj) S'^(Ofl)) are often (, 
even more interesting as they provide insights into the 
system's dynamics. Within the framework of SSE mea- 
suring these Green's functions G{\i., oj) requires the in- 
sertion of local changes on certain world-lines (such as 
removing a particle at propagation level li on world- line 
wi and re-inserting it at propagation level I2 on world- 
line W2)- Performing these insertions is a highly non- 
trivial task since on the one hand detailed balance must 
be assured, on the other hand the whole process has to 
sample all distances r = t«2 — wi and all propagation 



FIG. 5. Scaling behavior of computation time C to reach 
a relative accuracy of 10~^ in the measured energy of the 2D 
AF Heisenberg model. On top: ln(C) versus ln(/3J) for lOx 10 
sites. Below: ln(C) versus In(A^s) for pj = 10. The time was 
measured in seconds on a DEC workstation. 



differences Al = I2 — h efhciently. Both requirements 
are already fulfilled by the loop update steps. Since this 
update inserts and moves local changes on the network 
of world-lines and connecting interaction vertices it can 
be used to record the corresponding Green's functions 
G{r, Al) "on the fly" while constructing the loop update. 
As an example we reconsider the hard-core boson model 
from section Ol and in particular the operator loop shown 
in Figs. ^ andjj which starts with the removal of a type-2 
particle on propagation level 6 of world-line 2; our cut-off 
power in the series expansion was L = 9, and previous 
diagonal updates have produced n = 7 "non-identity" 
interaction vertices. 

Taking level 6, the starting point of the loop, as 
zero point for the propagation direction we are now 
able to measure quantities of type {a\{r,Ai) a2iofi)) and 
02(7-, A/) 02(0,0)) during the construction of this loop. 
Fig. shows that for AZ = exactly 2 measurements 
of (a|(r,A/=o) a2(o,o)) and one of {a\{r,Ai=o) 02(0,0)) can be 
performed during the loop: one at the start (or end) 
of the loop at distance r = 0, two on adjacent world- 
lines (r — 1) while moving down (right) and up (left). 
The recorded value at each measurement is the prod- 
uct of the matrix elements of the creation/annihilation 
operators inserted at the open ends of the loop un- 
der construction. We denote the state at propagation 
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propagation step 
9 




start level 

(a^(r=-i) a2(0)y 



FIG. 6. The loop update constructed in Figs. ^ and |^ 
can be used to record measurements of the Green's function 
{al(r,Ai) 02(0,0)) and {a\(r,Ai) 02(0,0)) where r is a distance be- 
tween world-lines (or sites) and Al is a propagation level dif- 
ference. For sake of clarity only the measurements for Al = 
are explicitly marked in the figure. 



level 6 in our example before inserting the two cre- 
ation/annihilation operators as |q;(6)) and the state af- 
ter insertion of the operators as |a(6)). Then the A/ = 
matrix element (oj (1=1, Ai=o) 02(0,0)) - measured when the 
loop head moves down on world line 3 - is 

(a2(r=i,Ai=o) 02(0,0)) = ((5(6)1 a2(i-=i,Ai=o) 02(0,0) I a(6)) . 

Stepping down by one more propagation level on world 
line 3 we can record the Al^O matrix element 

(4(r=l,Ai=-l) 02(0,0)) 

= (a(5)|aJ(r=i,Ai=-i)|a(5)) (<5!(6)| 02(0,0)] Q!(6)) . 

Leaving our hardcore boson example behind and return- 
ing to the general case we conclude this paragraph with 
the remark that for the creation or annihilation of a 
fermion the recorded matrix elements are always equal 
to 1, while they can adopt other values for spin flips or 
the creation/annihilation of bosons. 

Having measured and recorded the quantities G{r, Al) 
(or a correlation function C{r,Al) = {D2{r,T)Di{0,0))) 
we still have to perform a couple of non-trivial transfor- 
mation steps till we obtain the desired quantities G{k, uj) 
and C{k,oj) which describe the dynamical response of 
the system to external perturbations. First we have to 
relate propagation levels Al to imaginary times r, then 
a Fourier transform brings us from r-space to /c-space; fi- 
nally we need an inverse Laplace transform to step from 
imaginary time r to exitation energy lo. 



VI. EFFICIENTLY ACCESSING THE SYSTEM'S 
DYNAMICS 

In this section we will discuss efficient implementa- 
tion strategies for recording G'(r, Al) and for the adjacent 
transformation steps mentioned above. 

The transformation from propagation levels Al to 
imaginary time r requires the same weight factors as dis- 
cussed earlier for diagonal correlation functions: 



G(r,r)= ^ 



Ai=0 
L 



r \ / \ Al / X l-a;' 



AIJ \I3 



G(r,AO (19) 



J2 wiT,Al) G(r,AO. 



Ai=0 



where 



w{t, Al) 



Al / L-Al 



(20) 



Working in a fixed string size representetion with fixed 
L instead of varying n is more convenient because the bi- 
nomial weight prefactors are fixed during the entire simu- 
lation and can easily be calculated once at the beginning 
of the simulation. 

There are several possible ways to implement the 
recording of G(r, Al) measurements and the adjacent 
transformation to G(r, r). The easiest and at first glance 
fastest way simply writes all recorded G(r, Al) data into a 
two-dimensional array with dimensions Ns and L oc iVg/?. 
The transformation to G(r, t) can then be performed 
once at the end of the simulation. However, this method 
has two problems. A separate measurement has to be 
recorded each time the loop head steps up or down by one 
level on a world-line and whenever it traverses an interac- 
tion vertex. Recording all these measurements drastically 
slows down the loop update process. Second, for large 
systems (TVg « 5000) and low temperatures {(3 « 40) the 
two-dimensional array needed to store G(r, Al) contains 
about 1 billion elements and needs more memory than 
available on many computer systems. 

In order to overcome these problems one can replace 
the "brute force" recording of data on oH traversed (r, A^) 
points by a Monte Carlo sampling: in each loop-update 
a distance Al is chosen randomly, according to the prob- 
abilities in Eq. (p^), for each of the times r of interest. 
Measurements are then performed only at these Al and 
transformed directly into r. 

In our code we have adopted a third strategy: we per- 
form a/Z possible G(r, Al) measurements (thereby exploit- 
ing the fact that G(r, Al) is constant on the entire world- 
line fragment between tho adjacent vertices) and directly 
transform these into G(r, r) at the end of each loop up- 
date step. The transformation after each QMC update 
step is necessary to keep memory requirements low. 
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Simply applying Eq. (|T^) with its computationally 
expensive operations (divisions,powers, binomial coeffi- 
cients, large sums) would now cost by far too much com- 
putation time. Instead we remember that G{r, AZ) is 
composed of a relatively small number of intervals / = 
]AZi(/), AZ2(/)] with constant function value (Fig. 0b)). 
Therefore we can compute the contribution of an entire 
AZ-interval to G(r, t) in one step: 

G{r,T)=Y,G{r,I){W{T,M2(i))-W{T,M^ii))), (21) 



where W is the "integrated weight function" 



W{t,/^1) = ^w;(t, 



(22) 



m=0 



The AZ-range in which W{t^ Al) considerably differs from 
and 1 is determined by mean value and standard devi- 
ation of the binomial distribution w{t, Al) 



iAl)=L- 




(23) 
(24) 



Below {Al) — 5 OA; the integrated weight is zero, above 
(Al) + 5 OA; it is 1 (up to an error of less than 10~^). 
The remaining interval rarely contains more than fifty 
or hundred A/-points (see Fig. ^); these values can 
easily be stored after having been computed once for each 
r. Thus W{t, Al) can be calculated very rapidly with 
nothing but a couple of "cheap" elementary operations. 

For very large systems and very low temperatures the 
"relevant" AZ-ranges might become so large that it is 
unfavorable to store all needed W{t, Al) values - for ex- 
ample because accessing the large array W[Ti, Al] would 
caust too many cache misses. In this case one can 
store the coefficients of some interpolation functions for 
W{t, Al) instead of the function values themselves. Prac- 
tical tests have shown that dividing the relevant inter- 
val [{Al) — Scta/, (AZ) -|- 5crA/] into six sub-intervals with 
boundaries (AZ) — Soa;, (AZ) — 2.8 oa;, (AZ) — 1.3 oa/, 
(AZ), (AZ) -I- 1.3 OA/, (AZ) + 2.8 OA; and (AZ) + 5aA; and 
interpolating W in each sub- interval by a fifth-order poly- 
nomial is a good compromise between evaluation speed 
(about 15 elementary operations), storage requirements 
(36 floating point numbers for each r) and interpolation 
accuracy (better than 2.. 3 x lO"*"). 

The next transformation step, Fourier transform from 
G{r) to G{k), is a well known standard method that does 
not impose any fundamental problems. However, stan- 
dard Fast Fourier Transform (FFT) algorithms perform 
best if all G{k) values are to be calculated, whereas in 
practice one rarely needs all /c-values and is interested 
only in one Zc-point or in some special points of the Bril- 
louin zone, e.g. the point k = (tt, tt) and its immediate 



a) 



G(r,Al)^ 



Al 
Al 
Al 



b) 



G(r,Al) 



c) 




w(x,Al) 



Al 

2^w(x, m) 



FIG. 7. Transformation of Green's functions measurements 
from propagation level AZ to imaginary time r: the raw mea- 
surements recorded during loop update on different world-line 
segments (a) are combined into a single function G{r, Al) (b). 
For a given r G(r, r) could be computed by summing up all 
G(r,AZ) weighted with w(r,AZ) = (^^) (r)^'(i_ -)^-^' (c). 
A much more efficient way uses the "integrated weight func- 
tion" W{t, Al) = X^^Lq w{t, m) (d) to get the total contribu- 
tion of each range ]AZi, AZ2] in which G(r, AZ) is constant. In 
the example shown here G(r, r) is then simply aA + bB + cC. 



neighborhood. Then one can save a lot of computation 
time by not recurring to FFT but using optimized algo- 
rithms designed particularly for these cases. If we are 
interested in only one or a few fc-points we can use a 
simple Fourier transform to get {G(Zc, t)} from {G(r, r)} 
in 0{Ng ■ rife) operations (nfe is the number of Zc-points). 
Correlation functions C{k, r) can even be measured di- 
rectly in Zc-space, which also can be done in 0{Ns-nk) 
operations. For the case 1 <C rife ^ A^^ we have imple- 
mented a new Fourier transform algorilhm performing 
much better than FFT in this situation.E^ 

Unlike a Fourier transform a Laplace transform in gen- 
eral cannot be inverted . Therefore the last transition 
step from r to w is by far more complicated than the 
previous one from r to k. We use Maximum Entropy 
techniques developed, within the last years and refer to 
earlier publications.Ea 
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VII. EXAMPLE: SPIN CORRELATIONS OF THE 
2D HEISENBERG MODEL 

In this section we use our standard benchmark model 
- the sauqre lattice Heisenberg antiferromagnet - to test 
our method of Green's functions measurements for cor- 
rectness and numerical efficiency. To this purpose we 
calculate and compare the correlation functions 

{S'{T,k)S'{0,k)) and (25) 
{S+{T,k)S-iO,k)). (26) 

In the S'^-eigenbasis, which is normally used to span 
the model's Hilbert space, expression ( [25| ) is a time- 
dependent correlation function of two diagonal operators. 
Therefore the diagonal operator in Eq. (|25|) can be mea- 
sured using Eq. ( p^ ) without introducing changes in the 
world-lines and vertices defining the current state of the 
system. The Green's function Eq. (p6|), however, consists 
of two non-diagonal operators and can only be measured 
with our new method of recording general Green's func- 
tions that was described in section ^ Furthermore, at 
zero field h — both correlation functions are related via 

{S'{r,k)S%0,k)) = ^{S+{T,k)S-{0,k)), (27) 

so that the correctness of both estimators can be checked 
by directly comparing these two quantities. 

When working with the antiferromagnet wee need to 
keep in mind that in order to keep the exchange matrix 
elements S'^S~ and S~ positive we need to perform 
a gauge transformation, multiplying and S~ on one 
sublattice by —1. This gauge transformation does not 
affect any diagonal operator, but leads to a momentum 
shift of Q = (tt, tt), and ( p7| ) for the Green's function and 
Eq. ( p7| ) becomes 

(5^(t, k) S^O, k)} - i(^+(T, k + Q) S-{0, k + Q)). 

(28) 

The numerical data in Table ^ perfectly fulfil this equal- 
ity and hence demonstrate the correctness of our Green's 
functions measurements. 

In the simulation recorded in Table ^ we have calcu- 
lated (S*^ S^') and (5*+ S~) for all allowed fc-points on the 
path (0,0) ^ (7r,0) -> (7r,7r) -> (0,0). Table ^ shows 
a subset of these points in the vicinity of (tt, tt). The 
three tasks "performing updates", "measuring {S^ S^)" 
and "measuring {S^ 5*+)" contributed the following per- 
centages to overall computation time: 

performing updates : 18.8 % 
measuring (S*^ S*^) : 36.1 % 
measuring {S+ S~) : 45.1 % 

From this list and the measurement accuracies in Table 
we conclude that the highly non-trivial Green's functions 



TABLE V. Comparison of ^{S+{k + Q,t) S~ (k + Q,0)) 
and (5'=(fc, r) S'{k, 0)> for the 16 x 16-site 2D AF Heisenberg 
model at /3 = 16 and zero magnetic field. The table shows 
some fc- values around {n,n). 



k 


T 


(S^(fc,T) S^Cfc.O)) i{5'+{fc + Q,r) 5 (fc + Q,0)) 


/ 7r TV \ 
V 2 ' 2 / 







n 1 (\R(9 + 4) 




0.1 


0.00(01 ± 17) 


0.003(2 ± 3) 




0.5 


-0.00(36 ± 15) 


-0.000(4 ± 3) 


(3tt 3-k\ 
V 4 ' 4 / 











0.1 


01f74 ± 20) 


020f9 ± 5) 




0.5 


0.00(41 ± 19) 


0.000(4 ± 4) 


{n,n) 





11.3(35 ± 17) 


11.36(1 ±9) 




0.1 


10.4(04 ± 17) 


10.42(3 ± 9) 




0.5 


9.0(83 ± 17) 


9.09(3 ± 9) 







0.53(85 ± 23) 


0.542(2 ± 7) 




0.1 


0.06(31 ± 20) 


0.063(0 ± 6) 




0.5 


0.00(38 ± 17) 


0.000(0 ± 5) 







0.28(92 ± 23) 


0.287(6 ±5) 




0.1 


0.01(08 ±19) 


0.008(5 ± 4) 




0.5 


0.00(06 ± 17) 


0.000(3 ± 4) 



measurements lead to a slightly better accuracy than the 
direct (S'^(r, r)5'^(r', r')) measurements while consum- 
ing roughly the same amount of computer time as the 
latter. Measuring the Green's function is thus the pre- 
ferred method of determining also the diagonal real space 
dynamical correlation functions. 

VIII. SUMMARY 

Stochastic Series Expansion (SSE) together with the 
implementation tricks and Green's functions measure- 
ments described in this paper is a highly performant 
quantum Monte Carlo simulation technique allowing to 
access both static and dynamical properties of very large 
systems of thousands of sites and at very low tempera- 
tures. Compared to the loop-algorithm, which is slightly 
faster on big systems for some specific Hamiltonians, 
SSE has the advantages of not suffering from exponential 
slowing down in external fields; furthermore, SSE is more 
easily applicable to wide classes of Hamiltonians. 

We thank A. Sandvik for valuable discussions. This 
work was supported by DEC HA 1537/16-1,2 and KON- 
WIHR OOPCV ( A. D. ) and the Swiss National Science 
Foundation ( M. T. ). High performance calculations 
were performed at HLRZ Jiilich and LRZ Munich. 
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